Analysis of gradient descent methods with non-diminishing, bounded errors

Authors

  • Arunselvan Ramaswamy
  • Shalabh Bhatnagar
Abstract

Implementations of stochastic gradient search algorithms such as back propagation typically rely on finite difference (FD) approximation methods. These methods are used to approximate the objective function gradient in steepest descent algorithms as well as the gradient and Hessian inverse in Newton-based schemes. The convergence analyses of such schemes critically require that perturbation parameters in the estimators of the gradient/Hessian approach zero. However, in practice, the perturbation parameter is often held fixed to a ‘small’ constant, resulting in constant-error estimates. We present in this paper a theoretical framework based on set-valued dynamical systems to analyze such constant-error schemes. Easily verifiable conditions are presented for stability and convergence when using such FD estimators for the gradient/Hessian. In addition, our framework dispenses with a critical restriction on the step sizes (learning rates) when using FD estimators.
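The effect of a fixed perturbation parameter can be illustrated with a small sketch. It is not the authors' algorithm or analysis: the function names, the quadratic test objective, the forward-difference estimator, and the constant step size are all assumptions chosen for simplicity. For f(x) = 0.5 * ||x||^2, the forward difference with perturbation c returns x_i + c/2 in each coordinate, so the gradient estimate carries a constant error of c/2 and the iterates settle near -c/2 instead of the true minimizer 0.

```python
def forward_difference_gradient(f, x, c):
    """Estimate the gradient of f at x using a fixed perturbation c.

    Hypothetical helper: one forward difference per coordinate.
    """
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += c  # perturb only coordinate i
        grad.append((f(shifted) - fx) / c)
    return grad


def gradient_descent_fd(f, x0, step_size, c, num_iters):
    """Gradient descent driven by FD estimates; c is never reduced."""
    x = list(x0)
    for _ in range(num_iters):
        g = forward_difference_gradient(f, x, c)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    """Assumed test objective: 0.5 * ||x||^2, minimized at the origin."""
    return 0.5 * sum(xi * xi for xi in x)


# Assumed parameters, picked only for illustration.
perturbation = 0.01
final_x = gradient_descent_fd(
    quadratic, [1.0, -2.0], step_size=0.1, c=perturbation, num_iters=500
)
print(final_x)  # each coordinate ends close to -perturbation / 2
```

Shrinking `perturbation` moves the limit point closer to the origin, which matches the abstract's point that a fixed perturbation yields convergence only up to a constant error.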

Similar resources

Extensions of the Hestenes-Stiefel and Polak-Ribiere-Polyak conjugate gradient methods with sufficient descent property

Using search directions of a recent class of three--term conjugate gradient methods, modified versions of the Hestenes-Stiefel and Polak-Ribiere-Polyak methods are proposed which satisfy the sufficient descent condition. The methods are shown to be globally convergent when the line search fulfills the (strong) Wolfe conditions. Numerical experiments are done on a set of CUTEr unconstrained opti...

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several gradient-based TD algorithms with linear function approximation. The algorithms we analyze include: (i) two basic forms of two-time-scale gradient-based TD algorithms,...

Accelerating Stochastic Gradient Descent

There is widespread sentiment that fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy ball) are not effective for the purposes of stochastic optimization due to their instability and error accumulation. Numerous works have attempted to quantify these instabilities in the face of either statistical or non-statistical errors (Paige, 1971; Proakis, 1974; Polyak, 1987; G...

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on nonconvex optimization. Recent studies have shown that the asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge with a linear convergent rate on convex problems. However, there is no work to analyze asy...

Nonlinear Adaptive Control for Electromagnetic Actuators

The authors study here the problem of adaptive ”soft-landing” control for electromagnetic actuators. The soft landing requires accurate control of the actuator’s moving element between two desired positions. They propose a non-linear adaptive controller to solve the problem of robust trajectory tracking for the moving element, when considering model uncertainties with linear parametrisation. Th...

Publication date: 2016